Abstract —There are many practical applications that require 
simplification of polylines. Some of the goals are to reduce the 
amount of information necessary to store, improve processing 
time, or simplify editing. The simplification is usually done by 
removing some of the vertices, making the resultant polyline go 
through a subset of the source polyline vertices. However, such 
approaches do not necessarily produce a new polyline with the 
minimum number of vertices. This paper describes an approximate 
solution for finding a polyline, within a specified tolerance, with 
the minimum number of vertices. 

Index Terms —polyline compression; polyline approximation; 
orthogonality; circular arcs 

I. Introduction 

The task is to find a polyline, within a specified tolerance 
of the source polyline, with the minimum number of vertices. 
That polyline is called optimal. Usually, a subset of vertices of 
the source polyline is used to construct an optimal polyline [1], 
[?]. However, an optimal polyline does not necessarily have 
vertices coincident with the source polyline vertices. One 
approach, to allow the resultant polyline to have flexibility 
in the locations of vertices, is to find the intersection 
between adjacent straight lines [?] or geometrical primitives [?]. 
However, there are situations when such an approach does 
not work well, for example, when adjacent straight lines are 
almost parallel to each other or a circular arc is close to being 
tangent to a straight segment. The approach described in this 
paper evaluates a set of vertex locations (considered locations) 
while searching for a polyline with the minimum number of 
vertices. 

II. Algorithm 
A. Discretization of the Solution 

Any compressed polyline must be within tolerance of the 
source polyline; therefore, the compressed polyline must have 
vertices within tolerance of the source polyline. It would 
be very difficult to consider all possible polylines and find 
one with the minimum number of vertices; therefore, as an 
approximation, only some locations around vertices of the 



Fig. 1. Example of one segment (red segment) between considered locations 
(black dots) within tolerance of the source polyline (blue polyline). 


source polyline are considered (see the black points around 
the vertices of the source polyline in Fig. 1). 

The locations around vertices of the source polyline are 
chosen to be on an infinite equilateral triangular grid with 
the distance from vertices of the source polyline less than the 
specified tolerance. The equilateral triangular grid (see Fig. 2) 
has the lowest number of nodes compared with other grids (square, 
hexagonal, etc.) that satisfy the requirement that the distance from 
any point to the closest node does not exceed the specified threshold. 
Fig. 2. The worst case distance for the equilateral triangular grid is the 
distance from the center of the triangle O to any vertex of the equilateral 
triangle. If OA = OB = OC = 1, then AB = BC = CA = √3. 
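
The comparison between grids can be checked directly. As a worked example (not part of the original derivation), fix the worst case distance to the closest node at 1 and compute the area of the plane served by one node of each regular grid; a larger area per node means fewer nodes for the same coverage.

```latex
\begin{align*}
\text{triangular grid (side } \sqrt{3}\text{):} &\quad \text{area per node} = \tfrac{\sqrt{3}}{2}\left(\sqrt{3}\right)^{2} = \tfrac{3\sqrt{3}}{2} \approx 2.60,\\
\text{square grid (side } \sqrt{2}\text{):} &\quad \text{area per node} = \left(\sqrt{2}\right)^{2} = 2,\\
\text{hexagonal grid (side } 1\text{):} &\quad \text{area per node} = \tfrac{1}{2}\cdot\tfrac{3\sqrt{3}}{2} = \tfrac{3\sqrt{3}}{4} \approx 1.30.
\end{align*}
```

The hexagonal grid has two nodes per hexagon, which is why its area per node is half of the hexagon area.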

The choice for the side of an equilateral triangle in the 
equilateral triangular grid is calculated from the error it 
introduces. That error can be expressed as a proportion of 
the specified tolerance. For example, a proportion $q \in (0, 1)$ of 
the specified tolerance means that the side of the equilateral 
triangle is equal to $q\sqrt{3}$ times the specified tolerance. This 
leads to about $\frac{2\pi}{3\sqrt{3}\,q^{2}} \approx \frac{1.2}{q^{2}}$ locations per vertex. To 
decrease complexity, some locations might be skipped if they 
are considered in neighboring vertices of the source polyline; 
however, this should be done without breaking the combinatorial 
algorithm described in Section II-E. If the tolerance is large, 
it is possible to consider locations around segments of the 
source polyline. In this paper, to support any tolerance, only 
locations around vertices of the source polyline are considered. 
Densification of the source polyline might be necessary to find 
the polyline with the minimum number of vertices. 
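
The discretization can be illustrated with a minimal Python sketch. The function name `considered_locations`, the choice to anchor the grid at the coordinate origin, and the row layout are assumptions made for illustration; the paper only states that the locations lie on an equilateral triangular grid with side $q\sqrt{3}$ times the tolerance and closer to the vertex than the tolerance.

```python
import math

def considered_locations(vertex, tolerance, q):
    """Nodes of an equilateral triangular grid closer to `vertex` than `tolerance`.

    Assumption: the grid is anchored at the coordinate origin, so that
    neighboring source vertices share the same grid nodes.
    """
    side = q * math.sqrt(3.0) * tolerance      # side of the grid triangle
    row_height = side * math.sqrt(3.0) / 2.0   # distance between grid rows
    vx, vy = vertex
    rows = int(math.ceil(tolerance / row_height)) + 1
    cols = int(math.ceil(tolerance / side)) + 2
    row0 = round(vy / row_height)
    result = []
    for r in range(row0 - rows, row0 + rows + 1):
        y = r * row_height
        shift = 0.5 * side if r % 2 else 0.0   # odd rows are shifted by half a side
        col0 = round((vx - shift) / side)
        for c in range(col0 - cols, col0 + cols + 1):
            x = c * side + shift
            if math.hypot(x - vx, y - vy) < tolerance:
                result.append((x, y))
    return result

locations = considered_locations((0.0, 0.0), tolerance=1.0, q=0.2)
estimate = 2.0 * math.pi / (3.0 * math.sqrt(3.0) * 0.2 ** 2)  # about 30 locations
```

For `q = 0.2` the estimate is about 30 locations per vertex, and the number of grid nodes actually found should be close to it.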

B. Testing a Segment to Satisfy Tolerance 

For a compressed polyline to be within tolerance, every 
segment of the compressed polyline must be within tolerance 
of the part of the source polyline it describes. To find the 
compressed polyline with the minimum number of vertices, 
this test has to be performed many times for all combinations 
of possible locations of vertices (see Fig. 1). An efficient 
approach to perform these tests, based on the convex hull, is 
described in [?]. If the convex hull is stored as a polygon, the 
complexity of this task is $O(\log n)$, where $n$ is the number of 
vertices in the convex hull [?]. The expected number of vertices in 
the convex hull of $N$ random points in any rectangle is $O(\log N)$, 
see [?]. If the source polyline has parts close to an arc, the size 
of the convex hull tends to increase. In the worst case, the 
number of vertices in the convex hull is equal to the number 
of vertices in the original set. 

If no strip with a width of two tolerances covers the convex 
hull completely, then one segment cannot 
describe this part of the source polyline. The complexity of 
this check is $O(n \log n)$. 

A convex hull for any part of the source polyline is 
constructed in the same way as in [?]. 
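
The covering check can be sketched in Python as follows. This is a brute-force illustration, not the method of the cited work: it computes the minimum width of the convex hull in $O(n^2)$ time by using the known fact that the narrowest strip enclosing a convex polygon is parallel to one of its edges. The names `min_width` and `one_segment_possible` are hypothetical.

```python
import math

def min_width(hull):
    """Smallest width of a convex polygon given as (x, y) vertices in order."""
    n = len(hull)
    if n < 3:
        return 0.0
    best = math.inf
    for i in range(n):
        ax, ay = hull[i]
        bx, by = hull[(i + 1) % n]
        ex, ey = bx - ax, by - ay
        length = math.hypot(ex, ey)
        if length == 0.0:
            continue  # skip degenerate edges
        # Largest distance from the line through edge (a, b) to any hull vertex.
        width = max(abs(ex * (py - ay) - ey * (px - ax)) for px, py in hull) / length
        best = min(best, width)
    return 0.0 if best == math.inf else best

def one_segment_possible(hull, tolerance):
    """Necessary condition: a strip of width 2 * tolerance covers the hull."""
    return min_width(hull) <= 2.0 * tolerance

hull = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0)]
fits = one_segment_possible(hull, 0.5)
too_thick = one_segment_possible(hull, 0.4)
```

The condition is only necessary; the end point and direction tests of the following sections are still required.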


C. Testing Segment End Points 

The test described in Section II-B does not 
check the ends of the segment. The example in Fig. 3 shows 
a source polyline that reverses direction 
several times (zigzag) before going up. Without checking end 
points and changes in direction, the compressed polyline might 
not describe some parts of the source polyline (Fig. 3a). 
Therefore, these tests are necessary to guarantee that the 
compressed polyline (Fig. 3b) describes the source polyline 
without missing any parts. 



Fig. 3. The blue polyline is the source polyline. The red polyline is the 
result of the algorithm without checking for end points and the source polyline 
direction (a) and with both checks performed (b). 


Whether the segment end points are within the tolerance of the 
part of the source polyline is tested based on the convex 
hull, in the same way as the test for the segment to be within 
tolerance in Section II-B. 

This is equivalent to testing whether the segment, extended in the 
parallel and perpendicular directions by the tolerance (see 
Fig. 4), contains the convex hull of the part of the source polyline 
it describes. If more directions are used, a better approximation 
of the curved polygon can be obtained. The complexity of the 
test is $O(\log n)$. 
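
A minimal Python sketch of the rectangle approximation is given below. It scans all hull vertices, so it runs in $O(n)$ rather than the $O(\log n)$ stated above; the function name `rectangle_contains_hull` is hypothetical.

```python
import math

def rectangle_contains_hull(seg_start, seg_end, hull, tolerance):
    """Check that every hull vertex lies in the segment's rectangle
    extended by `tolerance` along and across the segment direction."""
    sx, sy = seg_start
    ex, ey = seg_end
    length = math.hypot(ex - sx, ey - sy)
    if length == 0.0:
        # Degenerate segment: the rectangle becomes an axis-aligned square.
        return all(max(abs(px - sx), abs(py - sy)) <= tolerance for px, py in hull)
    ux, uy = (ex - sx) / length, (ey - sy) / length  # unit direction of the segment
    for px, py in hull:
        along = (px - sx) * ux + (py - sy) * uy      # coordinate along the segment
        across = (py - sy) * ux - (px - sx) * uy     # signed distance to the line
        if along < -tolerance or along > length + tolerance or abs(across) > tolerance:
            return False
    return True

inside = rectangle_contains_hull((0.0, 0.0), (10.0, 0.0),
                                 [(-0.5, 0.5), (10.5, -0.5), (5.0, 1.0)], 1.0)
outside = rectangle_contains_hull((0.0, 0.0), (10.0, 0.0),
                                  [(-0.5, 0.5), (12.0, 0.0)], 1.0)
```

Because the hull is convex, checking its vertices is sufficient to decide containment in the (convex) rectangle.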



Fig. 4. The diagonal striped area is the tolerance area around the segment. 
The thin rectangle is the approximation of the area around the segment. A 
thick polygon would be a better approximation. 


D. Testing Polyline Direction 

The test for the source polyline to have a zigzag is 
performed by checking if the projection to the segment of 
backward movement exceeds two tolerances ($2T$, where $T$ is 
the tolerance). Two tolerances are used because one vertex of 
the source polyline can shift forward by the tolerance and 
the vertex after that shift backward by the tolerance. The 
algorithm is based on analyzing zigzags before the processed 
point. Let $p_i$, $i = 0..N-1$, be the vertices of the polyline, 
where $N$ is the number of vertices in the polyline. The next 
algorithm constructs a table for efficient testing. 

Define a set of directions $\alpha_j = \frac{2\pi}{N_d} j$, 
where $j = 0..N_d - 1$ and $N_d$ is the number of directions. 

Cycle over each direction $\alpha_j$, $j = 0..N_d - 1$. 

• Define the priority queue with requests containing two 
numbers. The first number is the real value, and the 
second number is the index. Priority of the request is 
equal to the first number. 

• Set $k = 0$. 

• Cycle over each point $p_i$ of the source polyline, 
$i = 0..N-1$. 

    • Calculate projection of $p_i$ to the direction $\alpha_j$ (scalar 
    product between the point and the direction vector): 
    $d = p_i \cdot (\cos(\alpha_j), \sin(\alpha_j))$. 

    • Remove all requests from the priority queue with 
    a priority of more than $d + 2T$. If the largest index 
    from removed requests is larger than $k$, set $k$ equal 
    to that index. 

    • Set $V_{j,i} = k$. 

    • Add request $(d, i + 1)$ to the priority queue. 

To test if the part of the source polyline between vertices 
$i_s$ and $i_e$ has a zigzag: 

First, find the direction $\alpha_{j^*}$ closest to the direction of the 
segment: $j^* = \operatorname{round}\left(\frac{\alpha N_d}{2\pi}\right) \bmod N_d$, where $\alpha$ is 
the direction of the segment. 

Second, if $V_{j^*,i_e} \le i_s$, then there are no zigzags for the 
segment describing the part of the source polyline from 
vertex $i_s$ till $i_e$. 

Let $W_i = \min_{0 \le j \wedge j < N_d} V_{j,i}$. If $i_s < W_{i_e}$, then one segment 
cannot describe the part of the source polyline from vertex $i_s$ 
till $i_e$. 

This test has some limitations: 

• The tested direction is approximated by the closest one, 
making the check approximate. 

• For some error models, a zigzag might pass the test. For 
example, if errors are limited by a circle, a zigzag by two 
tolerances is only possible if it happens directly on the 
segment. 

Nevertheless, it is an efficient test to avoid absurd 
results, like in Fig. 3a. The complexity of the algorithm is 
$O(N_d N \log(N))$ and the complexity to test any segment is 
$O(1)$. 
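
The table construction and the constant-time query can be sketched in Python as below. This is an illustration under stated assumptions: the function names `build_zigzag_table` and `has_zigzag` are hypothetical, the rounding formula for the closest direction is the natural one for evenly spaced directions, and `heapq` (a min-heap) stores negated projections so that the largest projection is removed first.

```python
import heapq
import math

def build_zigzag_table(points, tolerance, n_dirs):
    """table[j][i] is the value k after processing point i in direction j."""
    table = []
    for j in range(n_dirs):
        angle = 2.0 * math.pi * j / n_dirs
        ux, uy = math.cos(angle), math.sin(angle)
        heap = []   # entries are (-projection, index); largest projection on top
        k = 0
        row = []
        for i, (x, y) in enumerate(points):
            d = x * ux + y * uy
            # Remove all requests whose projection exceeds d + 2T.
            while heap and -heap[0][0] > d + 2.0 * tolerance:
                _, index = heapq.heappop(heap)
                k = max(k, index)
            row.append(k)
            heapq.heappush(heap, (-d, i + 1))
        table.append(row)
    return table

def has_zigzag(table, i_s, i_e, seg_angle):
    """True if the part of the polyline from i_s to i_e moves backward by
    more than two tolerances along the direction closest to seg_angle."""
    n_dirs = len(table)
    j = round(seg_angle * n_dirs / (2.0 * math.pi)) % n_dirs
    return table[j][i_e] > i_s

points = [(0.0, 0.0), (5.0, 0.0), (1.0, 0.0), (6.0, 0.0)]
table = build_zigzag_table(points, tolerance=1.0, n_dirs=4)
```

In this example the polyline moves back from x = 5 to x = 1, which exceeds two tolerances, so any part that contains vertices 1 and 2 is reported as a zigzag for a segment directed along the x axis.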

















E. Combinatorial Approach to Find an Optimal Solution 

The optimal solution is found by using the algorithm 
described in [?]. 

Let $p_{i,j}$ be considered locations for vertex $p_i$, where 
$i = 0..N-1$, $j = 0..N_i-1$, and $N_i$ is the number of considered 
locations for the vertex $i$. Let pairs $(i_k, j_k)$, 
$k = 0..m$, divide the source polyline into $m$ straight segments 
$(p_{i_k,j_k}, p_{i_{k+1},j_{k+1}})$ describing the source polyline from vertex 
$i_k$ till $i_{k+1}$, $k = 0..m-1$. Notice that neighbor segments are 
already connected in $p_{i_k,j_k}$, $k = 1..m-1$, and this solution 
avoids problems in algorithms [?], [?] when the intersection 
of neighbor segments is far away from the source polyline. 

The goal of this algorithm is to find the solution with 
the minimum number of vertices while satisfying the tolerance 
restriction and, among such solutions, the one with the minimum 
integral square differences. Therefore, minimization is performed in 
two parts, $(T^{\#}, T^{\epsilon})$, where the first part, $T^{\#}$, is the number of segments, 
and the second part, $T^{\epsilon}$, is the integral of the square deviation 
between segments and the source polyline. The solutions 
are compared by the number of segments and, if they have 
the same number of segments, by square deviation between 
segments and the source polyline. The solution of this task, 
when the optimal polyline has vertices coincident with the 
source polyline, can be found in [?]. 

Let $P_k$, $k = 0..N-1$, be parts of the source polyline from 
vertex $0$ to $k$. 

The optimal solution is found by induction. Define the 
optimal solution for polyline $P_0$ as 
$\left(T^{\#}_{0,j}, T^{\epsilon}_{0,j}\right) = (0, 0)$ (no segments and zero deviation), 
$j = 0..N_0-1$. For $k = 1..N-1$, construct the optimal 
solution for $P_k$ from optimal solutions for $P_{k'}$, $k' = 0..k-1$: 

$$
\left(T^{\#}_{k,j}, T^{\epsilon}_{k,j}\right) =
\min_{\substack{0 \le k' \wedge k' < k \\ 0 \le j' \wedge j' < N_{k'} \\ \operatorname{check}((k',j'),(k,j))}}
\left(T^{\#}_{k',j'} + 1,\; T^{\epsilon}_{k',j'} + \epsilon_{(k',j'),(k,j)}\right),
$$

where $\epsilon_{(k',j'),(k,j)}$ is the integral square difference between 
segment $(p_{k',j'}, p_{k,j})$ and the source polyline from vertex 
$k'$ till $k$, and $\operatorname{check}((k',j'),(k,j))$ is a combination of checks 
described in Sections II-B, II-C, and II-D to check 
if segment $(p_{k',j'}, p_{k,j})$ can describe the part of the source 
polyline from vertex $k'$ till $k$. 

To reconstruct the optimal solution, it is necessary 
to store $(k', j')$ when the right part is minimal. 

The optimal solution is reconstructed from 
$\min_{0 \le j \wedge j < N_{N-1}} \left(T^{\#}_{N-1,j}, T^{\epsilon}_{N-1,j}\right)$ 
by recurrently using stored $(k', j')$ values. 


F. Optimization 

It is possible to significantly reduce the complexity of the 
algorithm described in Section II-E by using the 
approach described in [?]. 

$$
\min_{\substack{k_1 < k' \wedge k' < k_2 \\ 0 \le j' \wedge j' < N_{k'} \\ \operatorname{check}((k',j'),(k,j))}}
\left(T^{\#}_{k',j'} + 1,\; T^{\epsilon}_{k',j'} + \epsilon_{(k',j'),(k,j)}\right)
\ge
\min_{\substack{k_1 < k' \wedge k' < k_2 \\ 0 \le j' \wedge j' < N_{k'}}}
\left(T^{\#}_{k',j'} + 1,\; T^{\epsilon}_{k',j'} + \epsilon^{(k_2)}_{(k',j'),(k,j)}\right),
\tag{1}
$$

where 

$$
\epsilon^{(k_2)}_{(k',j'),(k,j)} =
\min_{\substack{0 \le j_2 \wedge j_2 < N_{k_2} \\ \operatorname{check}((k',j'),(k_2,j_2)) \\ \operatorname{check}((k_2,j_2),(k,j))}}
\left(\epsilon_{(k',j'),(k_2,j_2)} + \epsilon_{(k_2,j_2),(k,j)}\right).
$$

From (1), it follows that 

$$
\min_{\substack{k_1 < k' \wedge k' < k_2 \\ 0 \le j' \wedge j' < N_{k'} \\ \operatorname{check}((k',j'),(k,j))}}
\left(T^{\#}_{k',j'} + 1,\; T^{\epsilon}_{k',j'} + \epsilon_{(k',j'),(k,j)}\right)
\ge
\min_{\substack{0 \le j_2 \wedge j_2 < N_{k_2} \\ \operatorname{check}((k_2,j_2),(k,j))}}
\left(T^{\#}_{k_2,j_2},\; T^{\epsilon}_{k_2,j_2} + \epsilon_{(k_2,j_2),(k,j)}\right)
\tag{2}
$$

and 

$$
\min_{\substack{k_1 < k' \wedge k' < k_2 \\ 0 \le j' \wedge j' < N_{k'} \\ \operatorname{check}((k',j'),(k,j))}}
\left(T^{\#}_{k',j'} + 1,\; T^{\epsilon}_{k',j'} + \epsilon_{(k',j'),(k,j)}\right)
\ge
\min_{0 \le j_1 \wedge j_1 < N_{k_1}}
\left(T^{\#}_{k_1,j_1} + 1,\; T^{\epsilon}_{k_1,j_1}\right)
+
\min_{\substack{0 \le j_2 \wedge j_2 < N_{k_2} \\ \operatorname{check}((k_2,j_2),(k,j))}}
\left(0,\; \epsilon_{(k_2,j_2),(k,j)}\right).
\tag{3}
$$

The maximum of (2) and (3) can be used to skip checking 
combinations between vertex $k_1$ and $k_2$. 

The inequalities (2) and (3) are approximate due to the 
use of considered locations. However, this allows finding 
stricter limitations for the solution inside the interval and 
simultaneously finding the solution for breaking at vertex $k_2$. 

It is possible to construct (2) and (3) with exact inequalities 
by constructing the optimal solution when the end point is 
not required to end in the considered location. Similarly, the 
part from vertex $k_2$ to $(k,j)$ should not be required to end 
in considered locations for vertex $k_2$. This is useful when the 
resultant polyline is required to go through the vertices of 
the source polyline. However, such an algorithm has a worse 
compression ratio than the one with the flexibility in joints. 

See paper [?] for further details of this algorithm. 
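
The induction of Section II-E, without the pruning discussed in this section, can be sketched in Python as a plain dynamic program. This is a minimal illustration, not the author's implementation: the function name `compress` and the callbacks `check` and `deviation` are hypothetical stand-ins for the combined tests of Sections II-B to II-D and for the integral square difference. Costs are stored as tuples (number of segments, deviation), which Python compares lexicographically, matching the two-part minimization.

```python
def compress(locations, check, deviation):
    """locations[k] lists the considered locations of source vertex k.
    check(k0, j0, k, j) -> bool and deviation(k0, j0, k, j) -> float
    are supplied by the caller (hypothetical callbacks)."""
    n = len(locations)
    inf = float("inf")
    best = [[(inf, inf)] * len(locations[k]) for k in range(n)]
    prev = [[None] * len(locations[k]) for k in range(n)]
    for j in range(len(locations[0])):
        best[0][j] = (0, 0.0)                      # no segments, zero deviation
    for k in range(1, n):
        for j in range(len(locations[k])):
            for k0 in range(k):
                for j0 in range(len(locations[k0])):
                    count, err = best[k0][j0]
                    if count == inf or not check(k0, j0, k, j):
                        continue
                    cand = (count + 1, err + deviation(k0, j0, k, j))
                    if cand < best[k][j]:          # lexicographic comparison
                        best[k][j] = cand
                        prev[k][j] = (k0, j0)      # stored for reconstruction
    j_end = min(range(len(locations[-1])), key=lambda j: best[-1][j])
    if best[-1][j_end][0] == inf:
        return None                                # no admissible polyline
    path, node = [], (n - 1, j_end)
    while node is not None:
        path.append(node)
        node = prev[node[0]][node[1]]
    path.reverse()
    return best[-1][j_end], path

locs = [[(0.0, 0.0)], [(1.0, 0.0)], [(2.0, 0.0)], [(3.0, 0.0)]]
cost, path = compress(locs,
                      check=lambda k0, j0, k, j: k - k0 <= 2,
                      deviation=lambda k0, j0, k, j: 0.0)
```

In the toy call, a segment may span at most two source segments, so two compressed segments are needed for four vertices.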

G. Optimal Compression of Closed Polylines 

To find the optimal compression of a closed polyline, it is 
necessary to know the starting vertex. It is also necessary that 
the resultant polyline starts and ends at the same vertex. The 
following algorithm is used to find the starting vertex and 
construct a closed resultant polyline. 

1. Construct a convex hull for all vertices of the source 
polyline. 

2. Find the smallest angle of the convex hull polygon. 

3. Take the vertex corresponding to the smallest angle as 
the starting vertex and reorient the closed polyline to start 
from that vertex. 

4. Apply the algorithm. 

5. From the constructed solution, take one vertex in the 
middle as the new starting vertex and reorient the closed 
polyline to start from that vertex. 

6. Apply the algorithm once more, while for the first and 
the last vertex consider only the location of the previous 
solution for the middle vertex. 

Steps and are important for a small closed polyline. 
For the small closed polyline, the resultant polyline is within 
tolerance of the source polyline, even with suboptimal 
orientation. As a consequence, without these steps, step may not 
find the optimal division of the source polyline, leading to a 
suboptimal solution. 
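
The selection of the starting vertex (the first three steps of the list above) can be sketched in Python as follows. The convex hull is built with Andrew's monotone chain, which is a choice made here for brevity and not necessarily the construction used in the paper; the function names are hypothetical, and the closed polyline is assumed to be given without a repeated last vertex.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def reorient_closed_polyline(points):
    """Rotate the vertex list so it starts at the sharpest convex hull corner."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return list(points)
    def interior_angle(i):
        cx, cy = hull[i]
        ax, ay = hull[i - 1][0] - cx, hull[i - 1][1] - cy
        bx, by = hull[(i + 1) % len(hull)][0] - cx, hull[(i + 1) % len(hull)][1] - cy
        return math.atan2(abs(ax * by - ay * bx), ax * bx + ay * by)
    start = hull[min(range(len(hull)), key=interior_angle)]
    idx = points.index(start)
    return points[idx:] + points[:idx]

closed = [(0.0, 0.0), (1.0, 0.5), (0.0, 1.0), (10.0, 0.5)]
reoriented = reorient_closed_polyline(closed)
```

In the example, the hull corner at (10, 0.5) has the smallest angle, so the polyline is rotated to start there.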


H. Optimal Compression by Straight Segments and Arcs 

This algorithm can be extended to support arcs. The arc passing 
through considered locations differs from the segment by the 
necessity to define the radius. Unfortunately, it adds significant 
complexity to the algorithm. Nevertheless, such an algorithm 
is possible. There are different ways to fit an arc to a 
polyline: minimum integral square differences of squares 
[?], minimum integral square differences [10]-[14], minimum 
deviation, etc. Algorithms with complexity $O(n)$, where $n$ is 
the number of vertices in the fitted polyline, are not suitable 
due to the significant increase in complexity. The algorithms 
with acceptable complexity $O(1)$ are [?], [13], [14]; 
however, algorithms based on integral square differences of 
squares [?], [?] might break for small arcs and, therefore, are 
not suitable. Checking that the part of the source polyline 
is within tolerance, end points, and zigzag will be 
time-consuming due to complexity $O(n)$. 


III. Analysis of the Algorithm Complexity 
The algorithm contains three steps: 


1. Preprocessing: construction of convex hulls (Section II-B) 
and filling arrays for an efficient zigzag test (Section II-D). 

2. Construction of the optimal solution (Section II-E). 

3. Reconstruction of the optimal solution (Section II-E). 

A significant amount of time is spent on constructing an 
optimal solution. It is difficult to evaluate the complexity 
of the approach described in Section II-F; however, the worst-case 
complexity is 

$$
O\left(N^{2} \cdot \max_{0 \le i \wedge i < N}\left(N_i^{2}\right) \cdot \log(N)\right). \tag{4}
$$

The complexity of the algorithm depends on the type of 
polyline it processes. It is very difficult to determine the 
practical complexity of this algorithm. If the optimal polyline 
does not have segments describing too many vertices of the 
source polyline, (4) tends to be 

$$
O\left(N \cdot \max_{0 \le i \wedge i < N}\left(N_i^{2}\right)\right). \tag{5}
$$

Fig. 5 shows how much time it takes to process a polyline 
depending on the number of vertices. The dependence is very 
close to linear, supporting (5). 
Fig. 5. Time needed to process a polyline versus the number of vertices. The 
time is measured in CPU ticks on the processor Intel Xeon CPU E5-2670. The 
polylines are generated by the Brownian motion process. Each next vertex 
is randomly incremented from the previous vertex by random vector, with 
components normally distributed with zero mean and 0.25 standard deviation. 
The tolerance was set to one. The average reduction in the number of vertices 
is about 50 times. 
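
The test polylines described in the caption of Fig. 5 can be generated with a short Python sketch. The function name `brownian_polyline`, the starting point at the origin, and the optional seed are assumptions for illustration; only the increment distribution (independent normal components with zero mean and 0.25 standard deviation) comes from the caption.

```python
import random

def brownian_polyline(n_vertices, sigma=0.25, seed=None):
    """Polyline whose consecutive vertices differ by a random vector with
    independent normal components (zero mean, standard deviation sigma)."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0                 # assumed starting point
    points = [(x, y)]
    for _ in range(n_vertices - 1):
        x += rng.gauss(0.0, sigma)
        y += rng.gauss(0.0, sigma)
        points.append((x, y))
    return points

polyline = brownian_polyline(1000, seed=1)
```

Fixing the seed makes timing experiments repeatable across runs.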


IV. Examples 

Fig. 6 shows an example of the algorithm described in this 
paper. If the source polyline is a noisy version of a ground 
truth polyline, where the noise does not exceed some 
threshold, and the algorithm is provided with a tolerance slightly 
greater than the threshold to account for approximations inside 
the algorithm, then the resultant polyline will never have more 
vertices than the ground truth polyline. 

The effectiveness of the approach is shown in Fig. 7. Nine 
segments are sufficient to represent the arc with the specified 
precision. The algorithm not only optimizes the number of 
segments, it also finds the locations of the segments that 
minimize integral square differences. Therefore, as shown in 
Fig. 7, the algorithm tends to construct segments similar in 
length. 

Fig. 8 shows how the efficiency of the compression depends on 
the error introduced by the discrete set of considered locations 
(see Section II-A). Flexibility in places where 
neighboring segments connect to each other is very important to 
reach maximum compression, especially for noisy data. 

























Fig. 6. Comparison of the result of approximation of the Douglas-Peucker 
algorithm (b) and approximation of optimal polyline compression (c). The 
green polyline is a ground truth. The red polyline is the source polyline 
(a), the result of the Douglas-Peucker algorithm [?] (b), and the result of 
optimal polyline compression (c). The black dots around vertices of the source 
polyline are considered locations for the vertices of the compressed polyline. 
The vertices of the source polyline are deviated from the segments of the 
ground truth polyline by random values uniformly distributed in the interval 
(−0.1, 0.1). 



Fig. 7. The black polyline is the source polyline. The red circles are the 
vertices of the optimal polyline. Ground truth is the arc of 90°. The noise has 
uniform distribution in the circle of one percent of the arc radius. 


V. Optimal Compression by Orthogonal 
Directions 

The triangular grid for considered locations supports 
directions by 30°. Reconstruction of orthogonal buildings requires 
support for 90° (L) and sometimes 45°. The square grid for 
considered locations is more appropriate for this task. 

Notice that because only certain directions are allowed, 
only segments between pairs of considered locations aligned 
by these directions may be parts of the resultant polyline. 
Suppose that the resultant segment goes between vertex i and 
j. Because it has to be within tolerance for all vertices between 
i and j, it goes through their considered locations (with the 
exception of the segment deviating close to the tolerance due 
to discretization of considered locations). 



Fig. 8. The number of segments versus discretization error. The polyline 
was generated by the Brownian motion process in the same way as in Fig. 5 
with 10,000 vertices. 


The optimal solution is found by induction. Define the 
optimal solution for polyline $P_0$ as $T_{0,j,q}$, where 
$j = 0..N_0-1$, $q = 0..M-1$, and $M$ is the number of 
different directions. For the orthogonal case, $M = 4$, and for the 45° 
case, $M = 8$. Take directions as $\alpha_i = \frac{360°}{M} \cdot i$, $i = 0..M-1$. 

For $k = 1..N-1$, construct the optimal solution for $P_k$ from 
the optimal solution for $P_{k-1}$: 

$$
T_{k,j,q} =
\min_{\substack{0 \le j' \wedge j' < N_{k-1} \\ 0 \le q' \wedge q' < M \\ 2\left|q' - q\right| \ne M \\ \operatorname{angle}\left(p_{k,j} - p_{k-1,j'},\, \alpha_{q'}\right)}}
\bigl(\,\ldots\,\bigr),
$$

where 

$$
\delta_{q',q} =
\begin{cases}
1, & \text{if } q' \ne q,\\
0, & \text{otherwise;}
\end{cases}
$$

$\operatorname{angle}(v, \alpha)$ is the check that the vector $v$ has angle $\alpha$ (zero 
length vectors are allowed). 

The condition $2\left|q' - q\right| \ne M$ corresponds to prohibiting 
changes in direction by 180°. 

For the 45° case, it is possible to restrict the resultant 
polyline from having sharp angles by not allowing a change 
of direction by 135° ($\left|4 - ((q' - q) \bmod 8)\right| \ne 1$). 

Notice that there are no checks for the tolerance, direction, 
and end points because they are satisfied during each induction 
step. 
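
The two restrictions on direction changes stated above can be written as a small Python predicate. This is a sketch; the function name `direction_change_allowed` and the flag `forbid_sharp` are hypothetical, and direction indices are assumed to run from 0 to M − 1.

```python
def direction_change_allowed(q_prev, q, m, forbid_sharp=False):
    """Return True if changing from direction q_prev to direction q is allowed.

    A change by 180 degrees is always prohibited. For m == 8, changes by
    135 degrees are prohibited as well when forbid_sharp is set.
    """
    if 2 * abs(q_prev - q) == m:
        return False
    if forbid_sharp and m == 8 and abs(4 - ((q_prev - q) % 8)) == 1:
        return False
    return True

orthogonal_turn = direction_change_allowed(0, 1, 4)
reversal = direction_change_allowed(0, 2, 4)
sharp_turn = direction_change_allowed(0, 3, 8, forbid_sharp=True)
```

Python's `%` operator returns a non-negative result for a positive modulus, so negative differences of indices are handled without extra code.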

Analyzing the previous solution along $M$ directions will 
further reduce the amount of calculations. The total complexity 
of the algorithm is 

$$
O\left(N \cdot \max_{0 \le i \wedge i < N}\left(N_i^{2}\right) \cdot M\right).
$$

For some data, the algorithm may produce an improper 
result. This happens when the introduction of a zero length 
segment lowers the penalty. 

Because the correct orientation is not known in advance, it 
is necessary to rotate polylines by different angles and take 
the solution with the lowest penalty [15, see section 6]. 





























Fig. 9 shows an example of the reconstruction of orthogonal 
buildings. 



Fig. 9. The black polylines are reconstructed buildings from lidar data [?]. 
The red polylines are the resultant orthogonal shapes. The blue polylines are 
the ground truth taken from [?]. 


The reconstruction of buildings with 45° sides is shown in 
Fig. 10. 



Fig. 10. This differs from Fig. 9 by the allowance of 45° segments. 


The main difference between the algorithm described in this 
section and [?] is in the parameters. The specification of the 
tolerance is easier than the specification of the penalty A for 
each additional segment. 


VI. Conclusion 


This paper describes an approximation algorithm that finds a 
polyline with the minimum number of vertices while satisfying 
tolerance restriction. The solution is optimal with the following 
limitations: 


• The vertices of the compressed polyline are limited to 
considered locations (Section II-A). 

• The test that the vertex of the compressed polyline is 
located between some vertices of the source polyline is 
approximate due to the snapping of the breaking point 
(Section II-F). 

• The tests for end points (Section II-C) and zigzags 
(Section II-D) are approximate. 


The performance of the algorithm can be greatly improved 
if the number of considered locations is decreased without 
losing quality. This requires further research. 